Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Targeted Gene Metagenomic Data Analysis ◾ 299

metrics. First, we need to normalize the read counts across samples to adjust for any bias
arising from differing sequencing depths and to make the comparison meaningful. The
normalization is performed by rarefying the counts in the feature table to a user-specified
depth. The lowest per-sample read count can be chosen as this depth; it is determined from
a summary created from the feature table and is then passed to the “--p-sampling-depth”
parameter of the “diversity” plugin as the sampling depth for all samples. When the plugin
command is executed, reads are drawn from each sample without replacement so that each
sample in the resulting table has a total count equal to the sampling depth. Then, the alpha
and beta diversity metrics are computed. The following script creates a summary from the
feature table to determine the lowest read count:

qiime feature-table summarize \
  --i-table dada2/table_feat_sample_freq_filtered_yoga_dada2.qza \
  --o-visualization dada2/table_feat_sample_freq_filtered_yoga_dada2.qzv \
  --m-sample-metadata-file data/sample-metadata.tsv

qiime tools view dada2/table_feat_sample_freq_filtered_yoga_dada2.qzv
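The lowest per-sample total can also be found on the command line instead of being read off the visualization. The following is a minimal sketch, assuming the feature table is available as a plain tab-separated file with features in rows and samples in columns. The small table, its sample names (S1 to S3), and the variable names are hypothetical and are not taken from the study data; a table exported from a real artifact may carry extra header lines that would need to be skipped first.

```shell
#!/usr/bin/env bash
# Hypothetical feature table: rows are features, columns are samples.
# (Illustrative values only, not the yoga dataset.)
table_tsv="$(mktemp)"
printf 'feature\tS1\tS2\tS3\n' >  "$table_tsv"
printf 'f1\t10\t0\t4\n'        >> "$table_tsv"
printf 'f2\t5\t7\t1\n'         >> "$table_tsv"
printf 'f3\t0\t3\t2\n'         >> "$table_tsv"

# Sum each sample column, then report the smallest column total.
min_depth="$(awk -F'\t' '
  NR == 1 { next }                                   # skip the header row
  { for (i = 2; i <= NF; i++) total[i] += $i; ncol = NF }
  END {
    min = total[2]
    for (i = 3; i <= ncol; i++) if (total[i] < min) min = total[i]
    print min
  }' "$table_tsv")"

echo "lowest per-sample read count: $min_depth"
rm -f "$table_tsv"
```

The printed value plays the role of the number that is later passed to --p-sampling-depth.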

When we study the summary, we can see that the lowest per-sample read count is 955
sequences, so we can set the --p-sampling-depth parameter to 955. This step subsamples the
counts in each sample without replacement so that each sample in the resulting table has a
total count of 955.
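To make the idea of subsampling without replacement concrete, the sketch below rarefies a single sample by hand. It is an illustration of the principle only, not the plugin's implementation. The feature names, counts, and depth are hypothetical, and the sketch assumes GNU coreutils (shuf) is installed.

```shell
#!/usr/bin/env bash
# Hypothetical counts for one sample as "feature count" pairs (18 reads in total).
counts="$(printf 'f1 10\nf2 5\nf3 3\n')"
depth=7   # hypothetical sampling depth; the text above uses 955

# 1. Expand every feature into one line per read.
# 2. Draw `depth` lines at random; shuf -n never returns the same input line twice,
#    so this is sampling without replacement.
# 3. Count how many of the drawn reads belong to each feature.
rarefied="$(printf '%s\n' "$counts" \
  | awk '{ for (i = 0; i < $2; i++) print $1 }' \
  | shuf -n "$depth" \
  | sort | uniq -c \
  | awk '{ print $2, $1 }')"

printf '%s\n' "$rarefied"

# The per-feature counts vary from run to run, but their sum always equals the depth.
total="$(printf '%s\n' "$rarefied" | awk '{ s += $2 } END { print s }')"
echo "total after rarefying: $total"
```

Because every read can be drawn at most once, a sample with fewer reads than the chosen depth cannot reach that depth; QIIME 2 drops such samples from the rarefied table, which is why the lowest read count is a convenient choice when all samples should be retained.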

The “core-metrics-phylogenetic” action of the “diversity” plugin requires a rooted
phylogenetic tree artifact, a feature table artifact, and the sample metadata file as inputs,
and it writes the alpha and beta diversity metrics to the specified output directory.

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny trees2/rooted-tree-yoga_dada2.qza \
  --i-table dada2/table_feat_sample_freq_filtered_yoga_dada2.qza \
  --p-sampling-depth 955 \
  --m-metadata-file data/sample-metadata.tsv \
  --output-dir diversity-indices

The metrics are saved to the output directory. We can use these metrics to explore the
microbial composition of the samples in the context of the groupings defined in the sample
metadata.

We will test for associations between categorical metadata columns and alpha diversity
data. We will do that here for Faith's Phylogenetic Diversity (a measure of community
richness) and Shannon diversity. The following commands test for significant differences in
the alpha diversity measures between sample groups:

qiime diversity alpha-group-significance \
  --i-alpha-diversity diversity-indices/faith_pd_vector.qza \
  --m-metadata-file data/sample-metadata.tsv \
  --o-visualization diversity-indices/faith-pd-group-significance.qzv
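For readers who want to see what the Shannon diversity mentioned above measures, the following sketch computes it for one sample from a list of feature counts, using H = -Σ p_i log2(p_i), where p_i is the proportion of reads assigned to feature i. The counts are hypothetical, and the base-2 logarithm is an assumption made for this illustration; other bases are also in common use and only rescale the value.

```shell
#!/usr/bin/env bash
# Hypothetical feature counts for a single sample (illustrative only).
counts="4 2 2"

# Convert counts to proportions and accumulate -p * log2(p) over all non-zero features.
shannon="$(printf '%s\n' "$counts" | tr ' ' '\n' | LC_ALL=C awk '
  { n[NR] = $1; total += $1 }
  END {
    h = 0
    for (i = 1; i <= NR; i++) {
      if (n[i] > 0) {
        p = n[i] / total
        h -= p * log(p) / log(2)
      }
    }
    printf "%.4f\n", h
  }')"

echo "Shannon index (bits): $shannon"   # prints 1.5000 for the counts 4 2 2
```

With proportions 0.5, 0.25, and 0.25, the index is 0.5 + 0.5 + 0.5 = 1.5 bits. A sample dominated by a single feature would score close to 0, while a sample with many evenly abundant features would score higher.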